Top-k SGD


Communication-efficient Distributed SGD with Sketching

Neural Information Processing Systems

However, theoretical and empirical evidence both suggest that there is a maximum mini-batch size beyond which the number of iterations required to converge stops decreasing and generalization error begins to increase [Ma et al., 2017, Li et al., 2014, Golmant et al., 2018, Shallue et al., 2018, Keskar et al., 2016, Hoffer et al., 2017]. In this paper, we aim instead to decrease the communication cost per worker.
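The abstract does not spell out the mechanism, but the general idea behind sketched gradient communication can be illustrated with a Count Sketch: each worker sends a small fixed-size sketch of its gradient instead of the full vector, and heavy coordinates can be approximately recovered from the sketch. Below is a minimal NumPy illustration of that idea; the sizes (`rows`, `cols`) and function names (`count_sketch`, `estimate`) are our own assumptions, not the paper's exact protocol.

```python
import numpy as np

def count_sketch(grad, rows=5, cols=256, seed=0):
    """Compress a d-dimensional gradient into a small rows x cols table."""
    rng = np.random.default_rng(seed)               # shared seed so all workers hash alike
    d = grad.size
    bucket = rng.integers(0, cols, size=(rows, d))  # bucket hash per row
    sign = rng.choice([-1.0, 1.0], size=(rows, d))  # sign hash per row
    table = np.zeros((rows, cols))
    for r in range(rows):
        np.add.at(table[r], bucket[r], sign[r] * grad)  # scatter-add into buckets
    return table, bucket, sign

def estimate(table, bucket, sign):
    """Median-of-rows estimate of every coordinate from the sketch."""
    rows, _ = bucket.shape
    per_row = np.stack([sign[r] * table[r, bucket[r]] for r in range(rows)])
    return np.median(per_row, axis=0)

# A mostly-zero gradient with three heavy coordinates.
grad = np.zeros(10_000)
grad[[3, 42, 777]] = [5.0, -4.0, 3.0]

table, bucket, sign = count_sketch(grad)  # only rows*cols = 1280 floats communicated
approx = estimate(table, bucket, sign)
print(np.argsort(np.abs(approx))[-3:])    # heavy hitters 3, 42, 777 recovered
```

Because the sketch has a fixed size, the per-worker communication cost is independent of the model dimension, which is the property the abstract alludes to.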


Trustworthiness of Stochastic Gradient Descent in Distributed Learning

Li, Hongyang, Wu, Caesar, Chadli, Mohammed, Mammar, Said, Bouvry, Pascal

arXiv.org Artificial Intelligence

Distributed learning (DL) is a method for accelerating the training of deep learning models by distributing training tasks across multiple computing nodes [1]. However, as data scales continue to grow, the complexity of model gradients increases accordingly; consider, for example, training deep learning models on ImageNet [2], which contains over 14 million labeled images spanning approximately 22,000 categories, leading to constraints on communication efficiency [3]. Gradient compression aims to reduce the communication overhead of transmitting gradients between nodes and thereby enhance system computational efficiency [4, 5, 6]; it has thus emerged as an effective optimization technique in distributed learning, especially when training complex models on large-scale data. Among various gradient compression techniques, PowerSGD [6] and Top-K SGD [7] have emerged as prominent solutions for their ability to substantially reduce communication costs while preserving scalability and model accuracy in large-scale distributed learning. These two algorithms are particularly suitable for our study because they represent the two fundamental approaches to gradient compression: PowerSGD uses low-rank approximation, while Top-K SGD uses sparsification, transmitting only the largest-magnitude gradient entries. Both techniques are widely recognized for their practical effectiveness, especially when combined, to varying extents, with advanced features such as error feedback, warm start, and all-reduce, making them ideal candidates for assessing the privacy risks of compressed SGD in distributed deep learning systems.
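As a concrete illustration of the sparsification side, here is a minimal NumPy sketch of Top-k compression with error feedback: gradient mass dropped by the compressor is accumulated in a residual and re-injected at the next step. The class and parameter names are our own, not from either paper.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries of the gradient."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse

class ErrorFeedbackTopK:
    """Top-k sparsifier with error feedback (illustrative sketch)."""
    def __init__(self, dim, k):
        self.k = k
        self.residual = np.zeros(dim)   # mass dropped in previous rounds

    def step(self, grad):
        corrected = grad + self.residual          # re-inject previously dropped mass
        sparse = topk_compress(corrected, self.k)
        self.residual = corrected - sparse        # remember what was dropped now
        return sparse                             # this sparse vector is what gets all-reduced

comp = ErrorFeedbackTopK(dim=1000, k=10)
sent = comp.step(np.random.randn(1000))
print(np.count_nonzero(sent))  # 10 of 1000 entries transmitted
```

PowerSGD takes the other route: it reshapes the gradient into a matrix and approximates it as a low-rank product found by a few power-iteration steps (optionally warm-started from the previous factors), so the communicated volume scales with the chosen rank rather than the full gradient size.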


Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment

Yoon, Daegun, Oh, Sangyoon

arXiv.org Artificial Intelligence

To train deep learning models faster, distributed training on multiple GPUs has become a very popular scheme in recent years. However, communication bandwidth remains a major bottleneck of training performance. To improve overall training performance, recent works have proposed gradient sparsification methods that significantly reduce communication traffic. Most of them require gradient sorting to select meaningful gradients, as in Top-k gradient sparsification (Top-k SGD). However, Top-k SGD is limited in how much it can speed up overall training, because gradient sorting is significantly inefficient on GPUs. In this paper, we conduct experiments that demonstrate this inefficiency and provide insight into the causes of the low performance. Based on the observations from our empirical analysis, we plan to develop a high-performance gradient sparsification method as future work.
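The cost the authors highlight comes from fully sorting the gradient when only the k largest entries are needed. The toy CPU micro-benchmark below (NumPy, with illustrative sizes; the paper's measurements concern GPU kernels) contrasts a full sort with a linear-time partial selection that yields the same top-k set.

```python
import time
import numpy as np

d, k = 10_000_000, 1_000
grad = np.random.randn(d).astype(np.float32)

t0 = time.perf_counter()
full = np.argsort(np.abs(grad))[-k:]            # full O(d log d) sort
t1 = time.perf_counter()
part = np.argpartition(np.abs(grad), -k)[-k:]   # O(d) expected-time partial selection
t2 = time.perf_counter()

print(f"full sort: {t1 - t0:.3f}s, partition: {t2 - t1:.3f}s")
print(set(full) == set(part))                   # same top-k index set
```

Even on a CPU the partial selection is markedly cheaper; on GPUs the gap is compounded because sorting-style selection kernels parallelize poorly relative to the dense gradient computation they interleave with, which is the inefficiency this abstract reports.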